Search results for "Thompson sampling"

Showing 7 of 7 documents

Thompson Sampling Based Active Learning in Probabilistic Programs with Application to Travel Time Estimation

2019

The problem of Travel Time Estimation (TTE) is to estimate the travel time between a start location and a destination, based solely on the coordinates of the points under consideration. This is typically solved by fitting a function to a sequence of observations. However, it can be expensive or slow to obtain the labeled data or measurements needed to calibrate the estimation function. Active Learning tries to alleviate this problem by actively selecting samples that minimize the total number of samples needed for accurate inference. Probabilistic Programming Languages (PPLs) give us the opportunity to apply powerful Bayesian inference to model problems that involve uncertainties.…
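
The paper pairs a probabilistic travel-time model with Thompson-sampling-driven sample selection. As a rough illustration of that pairing (a minimal sketch, assuming a Bayesian linear model over trip features; the class and function names below are hypothetical, not from the paper), the learner can draw two parameter samples from the posterior and query the candidate trip where the sampled models disagree most:

```python
import numpy as np

# Minimal sketch: Thompson-style active learning for a Bayesian linear
# travel-time model. All names are illustrative, not from the paper.

class BayesianLinearTTE:
    def __init__(self, dim, noise_var=1.0, prior_var=10.0):
        self.A = np.eye(dim) / prior_var     # posterior precision matrix
        self.b = np.zeros(dim)               # precision-weighted mean
        self.noise_var = noise_var

    def posterior(self):
        cov = np.linalg.inv(self.A)
        return cov @ self.b, cov

    def update(self, x, y):
        # Standard conjugate update for a Gaussian likelihood.
        self.A += np.outer(x, x) / self.noise_var
        self.b += x * y / self.noise_var

    def select_query(self, candidates):
        # Thompson-style acquisition: draw two parameter samples and
        # query the candidate trip where the sampled models disagree most.
        mean, cov = self.posterior()
        w1 = np.random.multivariate_normal(mean, cov)
        w2 = np.random.multivariate_normal(mean, cov)
        disagreement = np.abs(candidates @ (w1 - w2))
        return int(np.argmax(disagreement))
```

Each measurement then sharpens the posterior exactly where the model is most uncertain, which is what lets active learning get by with fewer labeled trips.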

Estimation, Sequence, Active learning (machine learning), Computer science, Probabilistic logic, Inference, Function (mathematics), Bayesian inference, Machine learning, Artificial intelligence, Thompson sampling
Research product

Thompson Sampling Guided Stochastic Searching on the Line for Non-stationary Adversarial Learning

2015

This paper reports the first known solution to the N-Door puzzle when the environment is both non-stationary and deceptive (adversarial learning). The Multi-Armed Bandit (MAB) problem is the iconic representation of the exploration versus exploitation dilemma. In brief, a gambler repeatedly selects and plays one of N possible slot machines, or arms, and either receives a reward or a penalty. The objective of the gambler is to locate the most rewarding arm to play while, in the process, maximizing his winnings. In this paper we investigate a challenging variant of the MAB problem, namely the non-stationary N-Door puzzle. Here, instead of directly observing the reward, the gambler is only…
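
For reference, the stationary version of the gambler's loop is the standard Bernoulli Thompson sampling recipe: keep a Beta posterior per arm, sample once from each, and play the argmax. The sketch below shows only that baseline; the paper's contribution is handling non-stationary, deceptive feedback, which this loop does not attempt. `pull_arm` is a hypothetical environment callback returning 1 for a reward and 0 for a penalty.

```python
import numpy as np

# Classic Bernoulli Thompson sampling for the N-armed bandit described
# above: Beta(1, 1) priors, one posterior draw per arm, play the argmax.

def thompson_sampling(pull_arm, n_arms, horizon):
    successes = np.ones(n_arms)   # alpha parameters of the Beta posteriors
    failures = np.ones(n_arms)    # beta parameters
    for _ in range(horizon):
        theta = np.random.beta(successes, failures)  # one sample per arm
        arm = int(np.argmax(theta))                  # greedy w.r.t. samples
        reward = pull_arm(arm)                       # 1 = reward, 0 = penalty
        successes[arm] += reward
        failures[arm] += 1 - reward
    return successes, failures
```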

Adversarial system, Computer science, Property (programming), Process (computing), Reinforcement learning, Artificial intelligence, Representation (mathematics), Bayesian inference, Multi-armed bandit, Thompson sampling
2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA)
Research product

Thompson Sampling for Dynamic Multi-armed Bandits

2011

The importance of multi-armed bandit (MAB) problems is on the rise due to their recent application in a large variety of areas, such as online advertising, news article selection, wireless networks, and medicinal trials, to name a few. The most common assumption made when solving such MAB problems is that the unknown reward probability θ_k of each bandit arm k is fixed. However, this assumption rarely holds in practice, simply because real-life problems often involve underlying processes that are dynamically evolving. In this paper, we model problems where the reward probabilities θ_k are drifting, and introduce a new method called Dynamic Thompson Sampling (DTS) that facilitates Order St…
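
The core of DTS, as the abstract describes it, is keeping the Beta posterior responsive to drift. A common way to do that, shown here as a hedged reconstruction of the update rather than a verbatim copy of the paper's rule, is to cap the posterior's effective sample size at a threshold C so that older observations are geometrically discounted:

```python
# Sketch of the capped-posterior update behind Dynamic Thompson
# Sampling (DTS): bound alpha + beta by a threshold C so that old
# observations are discounted and drifting reward probabilities
# theta_k can be tracked. Hedged reconstruction; see the paper for
# the exact rule.

def dts_update(alpha, beta, reward, C=100.0):
    if alpha + beta < C:
        # Posterior still "light": ordinary Bayesian update.
        alpha += reward
        beta += 1 - reward
    else:
        # Posterior at the cap: update, then shrink back toward mass C,
        # which discounts older evidence geometrically.
        alpha = (alpha + reward) * C / (C + 1)
        beta = (beta + 1 - reward) * C / (C + 1)
    return alpha, beta
```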

Machine learning, Mathematical optimization, Computer science, Order statistic, Bayesian probability, Sampling (statistics), Regret, Artificial intelligence, Thompson sampling, Random variable, Selection (genetic algorithm)
2011 10th International Conference on Machine Learning and Applications and Workshops
Research product

Successive Reduction of Arms in Multi-Armed Bandits

2011

The relevance of the multi-armed bandit problem has risen in the past few years with the need for online optimization techniques in Internet systems, such as online advertisement and news article recommendation. At the same time, these applications reveal that state-of-the-art solution schemes do not scale well with the number of bandit arms. In this paper, we present two types of Successive Reduction (SR) strategies: 1) Successive Reduction Hoeffding (SRH) and 2) Successive Reduction Order Statistics (SRO). Both use an Order Statistics based Thompson Sampling method for arm selection, and then successively eliminate bandit arms from consideration based on a confidence threshold. While SRH…
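
A hedged sketch of the successive-reduction pattern follows: Thompson sampling picks arms, and a periodic test eliminates arms that are almost certainly not optimal. The Monte Carlo elimination test below is illustrative; SRH and SRO instead use Hoeffding bounds and order statistics, respectively, and `pull_arm`, `check_every`, and `threshold` are hypothetical names.

```python
import numpy as np

# Illustrative successive-reduction loop: Thompson sampling for arm
# selection, plus periodic elimination of arms whose Monte Carlo
# probability of being best drops below a confidence threshold.

def successive_reduction(pull_arm, n_arms, horizon,
                         check_every=500, threshold=1e-3, n_mc=2000):
    alpha = np.ones(n_arms)
    beta = np.ones(n_arms)
    active = np.arange(n_arms)
    for t in range(1, horizon + 1):
        theta = np.random.beta(alpha[active], beta[active])
        arm = active[np.argmax(theta)]
        r = pull_arm(arm)
        alpha[arm] += r
        beta[arm] += 1 - r
        if t % check_every == 0 and len(active) > 1:
            # Estimate P(arm is best) by sampling the joint posterior.
            draws = np.random.beta(alpha[active], beta[active],
                                   size=(n_mc, len(active)))
            p_best = np.bincount(draws.argmax(axis=1),
                                 minlength=len(active)) / n_mc
            active = active[p_best >= threshold]
    return active
```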

Reduction (complexity), Mathematical optimization, Computer science, Order statistic, Scalability, Sampling (statistics), Pairwise comparison, Scale (descriptive set theory), Thompson sampling, Selection (genetic algorithm)
Research product

Solving Non-Stationary Bandit Problems by Random Sampling from Sibling Kalman Filters

2010

Published version of an article from Lecture Notes in Computer Science. Also available at SpringerLink: http://dx.doi.org/10.1007/978-3-642-13033-5_21

The multi-armed bandit problem is a classical optimization problem where an agent sequentially pulls one of multiple arms attached to a gambling machine, with each pull resulting in a random reward. The reward distributions are unknown, and thus one must balance between exploiting existing knowledge about the arms and obtaining new information. Dynamically changing (non-stationary) bandit problems are particularly challenging because each change of the reward distributions may progressively degrade the performance of any fixed strategy. Alt…
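
The sibling-Kalman-filter idea lends itself to a compact sketch: model each arm's drifting reward mean as a Gaussian random walk, track it with a scalar Kalman filter, and select arms by sampling from the filters' state estimates (Thompson sampling with Gaussian posteriors). The noise parameters and function names below are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Sketch: one "sibling" Kalman filter per arm tracks a drifting
# Gaussian reward mean; play the arm whose sampled mean is largest.

def kalman_ts(pull_arm, n_arms, horizon, obs_var=1.0, trans_var=0.01):
    mu = np.zeros(n_arms)           # posterior means of the reward means
    var = np.ones(n_arms) * 100.0   # posterior variances (vague prior)
    for _ in range(horizon):
        var += trans_var            # random-walk drift inflates every arm
        samples = np.random.normal(mu, np.sqrt(var))
        arm = int(np.argmax(samples))
        x = pull_arm(arm)           # noisy real-valued reward observation
        # Scalar Kalman update for the played arm only.
        gain = var[arm] / (var[arm] + obs_var)
        mu[arm] += gain * (x - mu[arm])
        var[arm] *= (1 - gain)
    return mu, var
```

Unplayed arms' variances keep growing, so the sampler naturally revisits them once the environment may have changed, which is what keeps a fixed strategy from going stale.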

Scheme (programming language), Mathematical optimization, Optimization problem, Computer science, Bayesian probability, Kalman filter, Bayesian inference, Multi-armed bandit, Thompson sampling, Optimal decision
Research product

Thompson Sampling Guided Stochastic Searching on the Line for Adversarial Learning

2015

The multi-armed bandit problem has been studied for decades. In brief, a gambler repeatedly pulls one out of N slot machine arms, randomly receiving a reward or a penalty from each pull. The aim of the gambler is to maximize the expected number of rewards received when the probabilities of receiving rewards are unknown. Thus, the gambler must, as quickly as possible, identify the arm with the largest probability of producing rewards, compactly capturing the exploration-exploitation dilemma in reinforcement learning. In this paper we introduce a particularly challenging variant of the multi-armed bandit problem, inspired by the so-called N-Door Puzzle. In this variant, the gambler is only tol…
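
How Thompson sampling can guide searching on a line is easiest to see in a toy form. The sketch below is an illustrative construction, not the paper's scheme: it discretizes [0, 1], keeps a Beta posterior per point for the probability that the (possibly lying) environment answers "the target is to your right", and queries at the sampled point where that probability crosses 1/2. `ask_environment` is a hypothetical callback.

```python
import numpy as np

# Illustrative only: Thompson sampling applied to searching on a line.
# Left of the target, "go right" answers should dominate, so the
# target sits where the answer probability crosses 1/2.

def line_search_ts(ask_environment, n_points=100, horizon=1000):
    alpha = np.ones(n_points)   # counts of "go right" answers per point
    beta = np.ones(n_points)    # counts of "go left" answers per point
    for _ in range(horizon):
        theta = np.random.beta(alpha, beta)
        # Query the sampled crossing point of the answer probabilities.
        i = int(np.clip(np.sum(theta > 0.5) - 1, 0, n_points - 1))
        goes_right = ask_environment(i / (n_points - 1))  # True / False
        alpha[i] += goes_right
        beta[i] += not goes_right
    # Posterior estimate of the target's position on [0, 1].
    return np.sum(np.random.beta(alpha, beta) > 0.5) / n_points
```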

Scheme (programming language), Computer science, Bayesian probability, Bayesian inference, Multi-armed bandit, Line (geometry), Reinforcement learning, Artificial intelligence, Representation (mathematics), Thompson sampling
Research product

Arm Space Decomposition as a Strategy for Tackling Large Scale Multi-armed Bandit Problems

2013

Recent multi-armed bandit based optimization schemes provide near-optimal balancing of arm exploration against arm exploitation, allowing the optimal arm to be identified with probability arbitrarily close to unity. However, the convergence speed drops dramatically as the number of bandit arms grows large, simply because singling out the optimal arm requires experimentation with all of the available arms. Furthermore, effective exploration and exploitation typically demands computational resources that grow linearly with the number of arms. Although the former problem can be remedied to some degree when prior knowledge about arm correlation is available, the latter problem persists. In this…
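
One way to picture the decomposition, sketched below under stated assumptions rather than as the paper's game-theoretic formulation, is a two-level Thompson sampler: a top-level sampler chooses which arm subset to explore, and an independent per-subset sampler chooses the arm, so per-step work scales with the subset size instead of the full arm count:

```python
import numpy as np

# Hedged sketch of arm space decomposition as two-level Thompson
# sampling. Illustrative only; the paper frames the decomposition
# game-theoretically (Nash equilibria among cooperating units).

def decomposed_ts(pull_arm, n_arms, horizon, n_subsets=10):
    subsets = np.array_split(np.arange(n_arms), n_subsets)
    a = np.ones(n_arms); b = np.ones(n_arms)        # per-arm Beta posteriors
    A = np.ones(n_subsets); B = np.ones(n_subsets)  # per-subset posteriors
    for _ in range(horizon):
        s = int(np.argmax(np.random.beta(A, B)))    # pick a subset first
        idx = subsets[s]
        arm = idx[np.argmax(np.random.beta(a[idx], b[idx]))]
        r = pull_arm(arm)
        a[arm] += r; b[arm] += 1 - r
        A[s] += r; B[s] += 1 - r    # credit the subset with the arm's reward
    return a, b
```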

Mathematical optimization, Computer science, Nash equilibrium, Multi-agent system, Sampling (statistics), Game theory, Thompson sampling, Multi-armed bandit
2013 12th International Conference on Machine Learning and Applications
Research product